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Gravitational wave astronomy has tremendous potential for studying extreme astrophysical phe- 
nomena and exploring fundamental physics. The waves produced by binary black hole mergers 
will provide a pristine environment in which to study strong field, dynamical gravity. Extracting 
detailed information about these systems requires accurate theoretical models of the gravitational 
wave signals. If gravity is not described by General Relativity, analyses that are based on wave- 
forms derived from Einstein's field equations could result in parameter biases and a loss of detection 
efficiency. A new class of "parameterized post-Einsteinian" (ppE) waveforms has been proposed 
to cover this eventuality. Here we apply the ppE approach to simulated data from a network of 
advanced ground based interferometers (aLIGO/aVirgo) and from a future space based interferom- 
eter (LISA). Bayesian inference and model selection are used to investigate parameter biases, and 
to determine the level at which departures from general relativity can be detected. We find that in 
some cases the parameter biases from assuming the wrong theory can be severe. We also find that 
gravitational wave observations will beat the existing bounds on deviations from general relativity 
derived from the orbital decay of binary pulsars by a large margin across a wide swath of parameter 
space. 

PACS numbers: 04.80.Cc,04.80.Nn,04.30.-w,04.50.Kd 



I. INTRODUCTION 

Einstein's theory of gravity has been subject to a wide 
array of experimental tests and has passed them all with 
flying colors None of these tests, however, has probed 
the strong field, dynamical regime that pertains to the 
final inspiral and merger of compact objects. The Hulsc- 
Taylor binary pulsar PSR B1913+16 [f] and the double 
binary pulsar PSR J0737-3039A 0,13] have provided con- 
vincing evidence for the existence of gravitational waves, 
and have served as unique laboratories to test general 
relativity (GR), but these objects have relatively small 
orbital velocities, v/c ~ 10~ 3 , a mere factor of 10 faster 
than the Earth's orbit around the Sun. The parameter 
space covered by black hole mergers, where orbital veloci- 
ties d/c> 10~ 3 and can approach v/c ~ 0.7, is currently 
terra incognita - Dragons may yet lurk there. 

If not accounted for, the possibility that Einstein's the- 
ory of gravity may not correctly describe the production 
and propagation of gravitational waves could have dire 
consequences for gravitational wave astronomy. In the 
case of ground-based detectors, the detection of weak 
signals buried below the instrument noise requires ac- 
curate models of the gravitational waveforms. Errors in 
the modeling of these waveforms can lead to a loss in 
detection efficiency. When the signals are stronger, as 
will often be the case with space-based observations of 
black hole mergers, waveform templates will no longer 
be needed for detection, but a waveform model will be 
required to infer the physical parameters of the system, 
such as the masses and spins of the black holes, and the 



distance to the system. Waveform models based on an in- 
correct theory of gravity will lead to fundamental bias [f| 
in the recovered parameters. Because these waveforms 
would not accurately describe nature, the parameters 
that maximize the fit of such a waveform to data would 
not correspond to the true physical values of the system. 
This bias is distinct from that caused by imperfect mod- 
eling of GR, as explored in Q , as it reflects a fundamental 
lack of knowledge about the true nature of gravity, and 
not simply the use of inaccurate physical assumptions - 
see Q for more details. 

Turning the problem around, the discovery that Ein- 
stein's theory is flawed would be the greatest result to 
come out of gravitational wave astronomy [?J . This has 
served as the motivation for the development of a wide 
range of tests of GR that use gravitational wave observa- 
tions. These tests can be broadly classified as "extrinsic" 
or "intrinsic" . Extrinsic tests are possible when there is a 
concrete alternative theory, such as massive gravitons @- 
03, or Brans-Dicke theory d, 03 0343 ■ Intrinsic tests 
work within the confines of GR, and take the form of in- 
ternal consistency checks, such as measuring the multipo- 
lar structure of the metric [l7|, 03 , or multi- modal spec- 
troscopy of BH inspiral and ringdown waveforms fl9ll20|]. 
These tests are valuable, but they do not cover the full 
spectrum of possibilities. The existing extrinsic tests are 
limited by the lack of viable alternative models, while the 
intrinsic tests do not so much test GR, as "test the nature 
of massive compact bodies within GR" (to quote [21|). 

Convincing alternative models to GR are hard to find 
because none of the currently proposed alternatives can 
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satisfy key criteria that physicists would like to require. 
On the observational front, one wishes that any GR al- 
ternative passes all Solar System and binary pulsar tests 
with flying colors, only predicting deviations from GR in 
the strong-field regime, where tests are currently lac king . 
Many theories, such as Brans-Dicke theory P.JIg. [l3 - [l6| . 
are heavily constrained by this requirement On the 
theoretical front, one would wish viable GR alternatives 
to lead to well-posed theories, with a positive definite 
Hamiltonian and free of instabilities. All perturbative 
string theory and loop quantum gravity low-energy effec- 
tive theories [22|, [23| currently lead to higher-derivative 
theories, which might violate this theoretical criteria. 

The paucity of concrete alternative models to GR [24| 
has impacted other testing grounds, such as those based 
on solar system observations, or the aforementioned bi- 
nary pulsar systems. In those instances the standard 
approach has been to develop models that parameterize 
a wide class of possible departures from GR - the pa- 
rameterized post-Newtonian formalism [ 2 5 |- 28 1 and the 



29 . It is nat- 



parameterized post-Keplerian formalism 
ural to adopt the same strategy when analyzing grav- 
itational wave data, which leads to the parameterized 
post-Einsteinian (ppE) formalism introduced in Ref. [f|. 

To motivate this approach, consider the standard post- 
Newtonian (PN) expression for the dominant contribu- 
tion to the stationary phase waveform describing the 
Fourier transform of the time-domain gravitational wave 
strain signal of the inspiral of two non-spinning black 
holes on circular orbits (see e.g. fioj]): 



hanif) 
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where / is frequency, M = rf' 5 M is the chirp mass, 
M = mi + ?7i2 is the total mass, 77 = mim2/M 2 is 
the dimensionless, symmetric mass ratio, Dl is the lu- 
minosity distance and C is a geometric factor that de- 
pends on the relative orientation of the binary and the 
detector (its average for LISA is C = 2/5). The am- 
plitude A(f) and phase ^(/) are developed as a series 
in u = nMf = ?7 3 / 5 u 3 , where v is the relative velocity 
between the two bodies 1301 : 
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The coefficients 7^(77), ipkiv) an d "tpkiiv) are currently 
known up to k = 7 in the post-Newtonian expansion of 
GR. 

In the simplest proposal of Yunes and Pretorius Q, 
the phase and amplitude are modified by only one ppE 
term each, but as pointed out by the authors there is 



no reason to believe that an alternative theory of gravity 
will predict such a restricted deviation from GR. In view 
of this, Yunes and Pretorius proposed four different pa- 
rameterizations that differed in their level of complexity, 
one of the most complicated of which is (see Eq. (46) 
in 1) 
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l + Y, a ^ a ' ^gr(Z). 
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where the coefficients cti and (3i may depend on the sym- 
metric mass ratio r] (and in more general cases, also on 
the spin angular momenta and the difference between the 
two masses) and Aqr and V^gr are the standard expres- 
sions in Eqs. © and J3]). This is in essence the ppE 
approach. 

In an earlier study, Arun et. al. (3lT[33l | considered what 
can now be interpreted as a restricted version of the ppE 
formalism in which the exponents a% and hi are required 
to match those found in GR. This amounts to asking how 
well the standard PN expansion coefficients could be re- 
covered from gravitational wave observations. They also 
developed internal self-consistency checks based on the 
observation that each coefficient tpk (77) provides an inde- 
pendent estimate of the mass ratio 77. While interesting, 
these tests are limited in scope as few of the well known 
alternative theories of gravity (Brans-Dicke M, nft JT3— 
HH, Massive Graviton IsT -flil . Chern- Simons [22L |34M37| . 
Variable G (38|, TeVeS [39{ e7;c.) have corrections with 
exponents at and bi that match those of GR Q . The full 
ppE formalism allows us to look for a much wider and 
realistic set of possible departures from GR. 

Our goal here is to study how the ppE formalism can 
be used to search for waveform deviations from GR us- 
ing data from the next generation of ground based inter- 
ferometers (aLIGO/aVirgo) and future space based in- 
terferometers (e.g. LISA). Bayesian model selection is 
used to determine the level at which departures from 
GR can be detected (See Ref.fioj for a related study 
that uses Bayesian inference to study constraints on Mas- 
sive Graviton theories). Advanced Markov Chain Monte 
Carlo (MCMC) techniques are used to map out the pos- 
terior distributions for the models under consideration. 
From these distributions, we are able to quantify the de- 
gree of fundamental bias in parameter extraction, and in 
particular, if the fundamental bias can be significant in 
situations where there is no clear indication that there 
arc departures from GR. 

Recently, Pozzo et.al. [37] performed a similar study 
that applied Bayesian model selection to estimate the 
bounds that could be placed on massive graviton theory. 
As such, their work is a sub-case of the ppE framework, 
i.e. a particular choice of (b,/3). Their implementation 
differed from ours in that they used Nested Sampling 
while we used MCMC techniques, but as wc will show, 
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our results are in agreement with theirs for the relevant 
sub-case. 

We find that gravitational wave observations will al- 
low us to extend the existing bounds derived from pul- 
sar orbital decay [4l| into the region of parameter space 
that covers strong field departures from GR (a, > and 
b, > -5/3) (see Fig. dfl] in Sec. llVA]) . As expected, 
we find that the strength of the bounds on the ppE pa- 
rameters are inversely proportional to the signal-to-noisc 
ratio (SNR), and the extent to which deviations between 
GR templates and non-GR signals can be detected (the 
departure of the "fitting factor" from unity) scales as 
1/SNR 2 . The logarithm of the odds ratio used to de- 
cide if a signal is described by GR or some alternative 
theory follows the same 1/SNR 2 scaling. A more sur- 
prising result is the possibility of "stealth bias" whereby 
the parameters recovered using GR templates can be sig- 
nificantly biased even when the odds ratio shows no clear 
preference for adopting an alternative theory of gravity. 

The remainder of this paper is organized as follows. 
Section |TT] introduces the analysis framework in more de- 
tail, including a discussion of the waveform model, noise 
spectrum, and Bayesian tools used. Section HTT1 describes 
in detail the computational techniques used to to im- 
plement the analysis. Section IIVI presents the results of 
our analysis. Section |V] closes with a discussion of how 
our results might change as the degree of realism is in- 
creased, and identifies key questions to be addressed in 
future work. Throughout this paper we use geometric 
units with G — c = 1 . 



II. ANALYSIS FRAMEWORK 
A. Bayesian Inference 

Questions of model selection and parameter biases can 
be addressed very naturally in the framework of Bayesian 
inference. This approach is now well established in the 
field of gravitational wave data analysis, as are the tools 
used to carry out the analysis. To avoid unnecessary rep- 
etition, we will focus on those aspects of the analysis that 
are new, and refer the reader to Ref. [42[ for a detailed 
description of the techniques used. 

We are interested in comparing the hypothesis Hq that 
gravity is described by GR with the hypothesis Hi that 
gravity is described by an alternative theory belonging 
to the ppE class. Here we are dealing with nested hy- 
potheses, as the ppE models include GR as a limiting 
case. When new data d becomes available, our prior be- 
lief p{T-L) in hypothesis H is updated to give the posterior 
belief p(W\d). Bayes' theorem tells us that 



p(H\d) = 



p(d\H)p(H) 
p(d) 



(5) 



where p(d\H) is the (marginal) likelihood of observing the 
data d if the hypothesis holds, and p(d) is a normaliza- 
tion constant. For hypotheses described by models with 



continuous parameters, the likelihood p(d\%) is found by 
marginalizing the likelihood p(d\9,W) of observing data 
d for model parameters 9: 



P (d\H) = / d6 p(9,H)p(d\6,H), 



(0) 



where p(9, H) is the prior distribution of the parameters. 
The marginalized likelihood, p(d\H), is also known as the 
evidence for a given model. Hypotheses are compared by 
computing the odds ratio, or Bayes factor: 



p(Hi) P (d\Hi 



p(H \d) p(Ho)p(d\H ) ' 



(7) 



which gives the "betting odds" of Hi being a better de- 
scription of Nature than The normalization con- 
stant p(d) cancels in the odds-ratio. The prior odds ra- 
tio p(Hi)/p(Ho) gets updated by the likelihood ratio, 
p(d\'Hi)/p(d\'Ho), which is also known as the evidence 
ratio. In Bayesian analysis "today's posterior is tomor- 
row's prior" [43|], and p(H\d) is used in place of p{T~L) in 
subsequent analyses. While a single black hole inspiral 
event may not yield strong evidence for a departure from 
GR, several such observations can be combined to make 
a more compelling case. 

In addition to simply detecting deviations from GR, 
we are also interested in studying how departures from 
GR might affect parameter estimation. This can be as- 
sessed by looking at the posterior distribution function 
p(9\d, "H), which describes the probability distribution for 
parameters 9 under the assumption that the signals are 
described by model T-L given data d. The posterior dis- 
tribution is given by the product of the prior and the 
likelihood, normalized by the evidence: 



p{6\d,H) 



p{9,n)p{d\9,n) 

p(d\H) 



(8) 



Once the prior distribution and the likelihood function 
have been specified we are left with the purely mechanical 
task of computing the posterior distributions and odds 
ratio for competing hypotheses. 



B. Waveform Model 

The original ppE waveforms were for non-spinning, 
equal mass binaries in quasi-circular orbits, and included 
a description of the dominant harmonic through inspiral, 
merger and ringdown. In the current analysis we restrict 
our attention to the inspiral portion of the waveform, but 
our signals come from unequal mass binaries. We have 
examined the generalization of the ppE framework for un- 
equal mass systems, and find that for a single detection it 
is indistinguishable from the equal mass case. Including 
multiple detectors, and the merger and ringdown phases, 
which increase the signal-to-noise ratio, can help break 
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parameter degeneracies that exist in the inspiral phase, 
but these benefits come at the cost of having to consider 
additional ppE parameters. We will consider this in a 
separate publication. 

In the stationary phase approximation, our ppE wave- 
forms are parameterized as follows 



h(f) = h GR (f)[l + au a ]e l ^ b f<f a 



(9) 



where (a, a) are amplitude ppE parameters and (/3, b) are 
phase ppE parameters. As noted previously, both a and 
/3 can depend on the spin angular momenta and mass dif- 
ference of the two bodies, as well as the symmetric mass 
ratio of the system. With a single detection, however, 
these dependencies are impossible to determine, and so 
we defer an analysis of them to future work. Here /igr(/) 
is the usual GR waveform quoted in Eq. (TIJ . We set the 
maximum frequency cut-off at twice the innermost stable 
circular orbit frequency of a system described by GR. A 
more consistent choice would be to use the minimum of 
the ppE energy function, but the results were found to 
be fairly insensitive to the choice of / ma x- To simplify the 
analysis we restrict our attention to the lowest PN order 
in the amplitude of Eq. ([2]), setting j k = for k > 0. 
The GR phase terms in Eq. §5§ are kept out to k = 7. 
Furthermore, we limit the range of the ppE parameters a 
and b to not be greater than these corresponding highest 
order PN terms, namely a < 2/3 and b < 1. 1 

As discussed in the Introduction, the ppE framework 
introduces i sets of ppE theory parameters (an, a*, bi) 
that modify the amplitude and phase, but we here work 
to leading order, keeping only the i = set. This ap- 
proach will tend to over-estimate how well the ppE pa- 
rameters (ao) i0) A)> bo) = (a,a,/3, b) can be constrained 
by the data. A better approach, which we intend to pur- 
sue in future studies, is to marginalize over the higher 
order terms. 

Table I lists the leading ppE corrections that have been 
computed for several alternative theories of gravity. Gen- 
erally, the exponents a and b are pure numbers fixed by 
the theory, while the amplitudes a and (3 are free param- 
eters that relate to the unknown coupling strengths of the 
modified/additional gravitational degrees of freedom. 



C. Instrument Response 

The aLIGO /aVirgo analysis was performed using sim- 
ulated data from the 4 km Hanford and Livingston de- 



1 It is certainly conceivable that the leading order deviation arising 
from an alternative theory comes in at some high order, and has 
a much larger magnitude than the nearest exponent term in the 
PN expansion. Thus it is not a priori inconsistent to allow a 
range of exponents outside of that of the PN expansion used for 
the GR signal in the ppE waveforms, though this would require 
more complicated priors on the amplitudes, and so for simplicity 
in this study we restrict to the stated range. 



Theory 


a 




b 


B 


Brans-Dicke [9, 10, 14-161 







-7/3 


8 


Parity- Violation [22, 34-37] 


1 


a 







Variable G(t) [38] 


-8/3 


a 


-13/3 





Massive Graviton [8-141 







-1 





Quadratic Curvature [23, _44] 







-1/3 


P 


Extra Dimensions [45] 







-13/3 


P 


Dynamical Chern-Simons [461 


+3 


a 


+4/3 


P 



TABLE I: Leading ppE corrections in several alternative the- 
ories of gravity (GR corresponds to a = /3 = 0). In dy- 
namical Chern-Simons gravity, (a, B) are proportional to the 
spin-orbital angular momentum coupling. For non-spinning 
binaries, the last row would simplify to (a,B) = (0,0), but 
we include it here for completeness. 



tectors and the 3 km Virgo detector. The time delays 
between the sites and the antenna beam patterns were 
computed using the expression quoted in Ref . [47 1 . Since 
the detectors barely move relative to the source during 
the time the signal is in-band, the antenna patterns can 
be treated as fixed and the time delays At between the 
sites can be inserted as phase shifts of the form 2irfAt. 
For the instrument noise spectral density, we assumed all 
three instruments were operating in a wide-band config- 
uration with 



Sn(f) = 10 



-49 



111 



(2 - 2x 2 + x 4 ) 



(10) 

and x = (//215Hz). 

The space based (LISA) analysis was performed using 
the A and E Time Delay Interferometry channels |48j in 
the low frequency approximation p9l [Hoi ]. It is known 
that this approximation can lead to biases in some of the 
recovered parameters, such as polarization and inclina- 
tion angles. This, however, is an example of a model- 
ing bias introduced by inaccurate physical assumptions, 
and not of a fundamental bias resulting from incomplete 
knowledge of the theory describing gravity. In our cur- 
rent study the modeling bias is avoided by using the same 
low frequency response model to produce the simulated 
data and to perform the analysis. 

In contrast to the ground based detectors, the sig- 
nals seen by LISA are in-band for an extended period 
of time, and the motion of the detector needs to be 
taken into account. The time dependent phase delay 
between the detector and the barycenter and the time 
dependent antenna pattern functions are put into a form 
that can be used with the stationary phase approxima- 
tion waveforms by mapping between time and frequency 
using t(f) = (d$/df)/2Tr. Details of this procedure can 
be found in Ref. M- The noise spectral density model 
includes instrument noise and an estimate of the fore- 
ground confusion noise from unresolved galactic binaries, 
matching those quoted in Ref. (52j . 
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D. Likelihood Function 

Under the assumption that the noise is Gaussian, the 
likelihood that the data d would arise from a signal with 
parameters 9 is given by 



p(d\9) = Ce-* 2 ^/ 2 



(11) 



where C is a constant that depends on the noise level. 
Here 

X 2 ($) = (d-h(6)\d-h(6)), (12) 
and the brackets denote the noise weighted inner product 



(a\b) = 2 



a(f)b*(f) + a*(f)b(f) 

Sn(f) 



df • 



(13) 



For a theoretical study that assumes the noise is Gaus- 
sian and has a known spectrum, there is no need to add 
simulated noise to the data - the appropriate spread in 
the parameter values and overall topography of the likeli- 
hood surface follow from the functional form of the signal 
and the noise weighting in Eq. (|13p . Thus, we may write 
d = h{6') where 9' are the true source parameters. 

Many alternative theories of gravity predict the exis- 
tence of polarization states beyond the usual "plus" and 
"cross" polarizations of GR that complicate the treat- 
ment of the instrument response, whose Fourier trans- 
form is 

hinst = F + h + + F x h x + Fshs 

+F L h L + F vl h V i+F V 2h V2l (14) 

Here h +x are the usual plus and cross-polarization states, 
hs is a scalar (breathing) mode, hi is a scalar longitudi- 
nal model and hvi,V2 are two vectorial modes [53[, while 
the F's are the detector antenna patterns [HJ], which de- 
pend on the sky location (9, <j)) and polarization angle ip 
of the signal. 

To simplify the analysis we assume the usual polariza- 
tion content for a circular binary viewed at inclination 
angle I and neglect the other contributions: 



h+ = (l + cos 2 t)5i(/i) + 2cost3(/i) 
h x = (1 + cos 2 L)%(h) - 2 cos t9£(/i) 



(15) 



In other words, we have assumed that the signal in the 
detector has the form s(f) = F(8, 4>,ip, t) h(f) with the 
function F(6, <f>, ip, l) given by the usual GR expression. If 
additional polarization states were present, this assump- 
tion would result in a reduction in detection efficiency 
and biases in the recovery of the extrinsic parameters 
(6,cp,ip,L). 

The justification for making this simplification is that 
we are primarily interested in how well the intrinsic pa- 
rameters (a,a,/3,b) can be constrained, and we expect 
these parameters to be only weakly correlated with the 



extrinsic parameters. The presence of additional polar- 
ization states will provide an additional handle on de- 
tecting departures to GR [55T - [57| , and we plan to explore 
this possibility in the context of the ppE formalism in 
future work. 

Defining A+ = \F+h+{f;6)\ and A x = \F*h x (f;9)\, 
and similarly for 9' , the chi-squarcd goodness of fit of 
Eq. (fl~2|) can be re-expressed as 



X 2 {9) 



df 



Sn(f) 



At 



A'i + a: 



2(A+A' + + A X A' x )cosA* 
2(A X A'+ - A+A' x ) sin AV] , 



(16) 



where A* = *(0) - tf(0')- As noted in Ref. |5J], in 
the regime of interested where x 2 is small, all the terms 
in the above integrand are slowly varying functions of 
frequency, so it is possible to compute the likelihood very 
cheaply using an adaptive integrator. 



E. Priors 

As we shall see, the choice of priors on the ppE pa- 
rameters has a significant effect on the results, especially 
when it comes to model selection. The natural priors on 
the ppE parameters are those that come from existing 
data on binary pulsars, but these turn out to range from 
very restrictive to wide open depending on what sector of 
the ppE parameter space is being examined. To simplify 
the analysis we adopt uniform priors for the ppE parame- 
ters and seek to determine where direct GW observations 
would prove more constraining than the existing binary 
pulsar observations. 

The priors on the exponents a and b are taken to be 
uniform across the ranges a £ [—3, 2/3] and b £ [—4.5, 1]. 
The upper end of the range is chosen so that the ppE 
corrections to the amplitude and the phase do not go to 
higher order in the expansion parameter u than the post- 
Newtonian order of the reference GR waveforms. The 
lower end of the range is chosen to cover all known alter- 
native theories, though in any case, the low end of the 
range turns out to be far better constrained by binary 
pulsar observations. 

The priors on a, and (3 are more difficult to set. Lack- 
ing any theoretical or experimental guidance, we assign 
uniform priors for the amplitudes a, (3 £ [—1000,1000]. 
The range in a, (3 is set such that it is sufficiently large 
that at the most positive end of the prior ranges on a, 6, 
the exploration of possible values of a, f3 is not restricted 
by prior bounds. That is, even in the most poorly- 
constrained region of the ppE parameter-space, the con- 
straints are not due to an overly restrictive prior. 

The parameters used to describe the black hole binary 
were the log of the total mass M and the log of the chirp 
mass Ai, the sky location (cos#, <f>), orbital plane ori- 
entation (cos#l, (f>i), merger phase $ c , merger time t c , 
and luminosity distance Di. The angular parameters 
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are taken to have uniform priors that covered their nat- 
ural range. For the aLIGO studies, we assign uniform 
priors: ln(M/M ) G [1.3,5.3]; \n(M/M Q ) g [0.55,4.5]; 
t c /s g [1,16]; D L /Mpc g [0.1, 10 4 ]. For the LISA stud- 
ies, we assign uniform priors: ln(M/M Q ) g [12.2,16.8]; 
MM/M®) g [11.4,16]; t c /s g [1,6 x 10 7 ]; £> L /Gpc g 
[0.01,1000]. While we could use more physically moti- 
vated priors for the black hole parameters (such as dis- 
tance priors that scaled with L>|), these choices have lit- 
tle effect on the model comparison between GR and ppE 
waveforms. 



III. COMPUTATIONAL TECHNIQUES 

Posterior distribution functions for the alternative hy- 
potheses were computed using the Markov Chain Monte 
Carlo (MCMC) implementation described in Ref. [HJ, 
additionally enhanced by adding Differential Evolu- 
tion [59l . |60 ] to the mix of proposal distributions. The 
evidence for the competing hypotheses was calculated 
using the volume tessellation algorithm 16 if and cross- 
checked using thermodynamic integration [62j. 

The ppE waveforms introduce a number of compli- 
cations that make parameter estimation and model se- 
lection challenging. These complications can be seen 
when using the quadratic Fisher matrix approximation 
Tij = — did j (In p(6\d)} to estimate the parameter corre- 
lation matrix C ij = (Atf'A^) « rr. 1 . When evaluated 
at the GR limit point (a, 0) = (0,0), the quadratic ap- 
proximation to the Fisher matrix is singular, and it is 
necessary to include higher order derivatives to obtain 
a finite covariance matrix. The situation is worse when 
a = 0, as then a is fully degenerate with Dl, and when 
b = 0, as then /3 is fully degenerate with $ c . Partial de- 
generacies also exist whenever the a or b exponents match 
the exponents found in the post-Newtonian expansion of 
GR. 

The various degeneracies and parameter correlations 
do not constitute a fundamental problem with the ppE 
formalism, but they do demand that we use very effec- 
tive MCMC samplers that are able to fully explore the 
parameter space. The algorithm described in Ref. [42| 
uses parallel tempering with multiple, coupled chains, 
with each chain exploring a tempered likelihood surface 
p(d\9) 1 / T . The high temperature chains explore more 
widely, and can communicate this information via pa- 
rameter exchange to the T = 1 chain that is used for pa- 
rameter estimation. Parallel tempering helps the Markov 
chains explore complicated posterior distributions, but 
convergence can still be slow if the proposal distributions 
are not well chosen. 

The ultimate proposal distribution is the posterior dis- 
tribution itself, but since that is unavailable in advance, 
we have to make do with approximations to this ideal. 
The covariance matrix provides a local approxima- 
tion to the posterior distribution. It can be estimated 
semi-analytically using the Fisher information matrix, or 



more directly from the recent past history of the Markov 
chain itself. The latter approach introduces hysteresis 
into the chains, but so long as the covariance matrix is 
only updated occasionally the chains are asymptotically 
Markovian. In the present study, we continued to use 
the Fisher matrix based proposal distributions described 
in Ref. [13], but found that the convergence time of the 
chains was very long until we augmented these techniques 
with proposals based on Differential Evolution. 

Differential Evolution (DE) provides an approximation 
to the posterior distribution based on the past history 
of the chains. Unlike methods based on the covariance 
matrix, DE works extremely well with highl y co rrelated 
parameters. In its original formulation, DE [53 ] was de- 
signed to work with a population of TV parallel chains (all 
with temperature T = 1). The idea is very simple and 
can be coded in a few lines: Chain i is updated by ran- 
domly selecting chains j and k with j =/= k =/= i, forming 
the difference vector 8j — 6k and proposing the move 

m = 6i + i{e 3 - e) . (17) 

For Z)-dimensional multivariate normal distributions, the 
optimal choice for the scaling is 7 = 2.38/"v/2~D. Since the 
difference vector points along the D-dimensional error 
ellipse, the jumps are usually "in the right direction." 
It is a good idea to occasionally (e.g. 10% of the time) 
propose jumps with 7=1, which act as mode-hopping 
jumps when the samples (j, k) come from separate modes 
of the posterior. 

The original formulation of DE is not very practical 
since it requires ./V > 2D parallel chains for each rung on 
the temperature ladder. A more economical approach is 
to use samples from the past history of each chain [60| . 
It can be shown that this approach is asymptotically 
Markovian in the limit as one uses the full past history 
of the chain. We have implemented a variant of the DE 
algorithm as follows: 

• Create a history array for each parallel chain. 
Initialize a counter M. Store every 10 th sample in the 
history array and add to the counter each time a sample 
is added. DE moves are more effective if points during 
the burn-in phase of the search are discarded from the 
history array. 

• Draw two samples from the history array: j € [1 , M] , 
k g [1, M] and repeat if k = j. Propose the move to 

y = + 7(0; -0 fe ). (18) 

Here we draw 7 from a Gaussian of width 2.38/v2L> for 
90% of the DE updates and set 7 = 1 for the rest. 

The standard DE proposal seeks to update all the pa- 
rameters at once, but it is often more effective to update 
smaller sub-blocks of highly correlated parameters. We 
did this in ~ 30% of the DE proposals. 
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The fraction of all proposed moves that use DE is a 
tunable parameter. We used 60% DE proposals, 30% 
Fisher matrix based proposals, 5% draws from the prior 
distribution and 5% uniform draws with width ~ 10~ 6 
of the prior range. Notice that even though the Fisher 
matrix might be singular in certain regions of the pa- 
rameter manifold, one can still propose jumps with it. 
In those regions, the proposed jumps will not lead to a 
better likelihood, and will simply be rejected. 

With the mix of proposal distributions described 
above, and using ~ 10 parallel chains geometrically 
spaced with Tj+i = 1.3Ti, our MCMC implementa- 
tion converges quickly to a stationary distribution. The 
chains are typically run for 500,000 samples, with the 
first 100,000 discarded based on a conservative estimate 
of the burn-in length. 

The marginal likelihood, or evidence, p{d\T-L) is com- 
puted using independent codes supplied by Martin Wein- 
berg and Will Farr that implement Weinberg's volume 
tessellation algorithm (VTA) [Hj]. The VTA uses the 
posterior samples from the Markov chain to assign prob- 
ability to a partition of the sample space and performs 
the marginal likelihood integral directly. The samples are 
partitioned using a kd-tree, and volume elements con- 
taining m samples (we use to = 32 or m = 64) are used 
to provide a discrete approximation to the integral in 
Eq. ^ . The integrand in each volume clement is approx- 
imated using cither the average posterior density (Farr's 
code) or the median posterior density (Weinberg's code) 
of the to samples in the volume element. The VTA is 
applied to a sub-sample of the full chain, and by repeat- 
ing the calculation with different subsamplcs in a process 
called bootstrapping, it is possible to compute statistical 
errors bars on the evidence caused by using finite length 
Markov chains. 

There is a trade-off in the choice of the boxing number 
to, with large values of to providing better estimates of 
the average or mean posterior density in each cell, and 
small values of to providing better resolution to features 
in the posterior. In our experience, the statistical error 
found from the bootstrap procedure is usually smaller 
than the systematic error that we estimate by varying 
the boxing size from to = 16 to to = 64. 

As a cross check we applied thermodynamic integra- 
tion [62j to a few test cases using the implementation 
described in the appendix of Ref . [63| . In tests on distri- 
butions where the evidence can be calculated analytically, 
such as multi-variate Gaussians, wc found that thermo- 
dynamic integration gave more accurate results. On the 
other hand, thermodynamic integration requires many 
more chains (upwards of 50 for the ppE studies) and a 
careful tuning of the temperature ladder in order to re- 
solve the integrand. This tuning necessitates a long pilot 
run, or complicated adaptive tuning of the temperature 
ladder. So while thermodynamic integration produces 
more accurate results, it requires careful tuning and is 
far more computationally intensive. Based on the tests 
described in Appendix A, we estimate that the errors in 



the (natural) log Bayes factors computed using the VTA 
algorithm are of order ±2. 



IV. RESULTS 

We explore a range of questions concerning the ap- 
plication of the ppE formalism to detecting departures 
from GR using gravitational wave observations from both 
LISA and the three-detector network of aLIGO/aVIRGO 
interferometers. First, we derive simple estimates of how 
well the ppE parameters can be constrained by gravita- 
tional wave data by using ppE templates to detect GR 
signal injections. The spread in the recovered ppE pa- 
rameters establishes the range that is consistent with GR, 
and values outside of this range would point towards a de- 
parture from GR. We then compare these simple bounds 
to the more rigorous (and computationally expensive) 
bounds that can be derived from Bayesian model selec- 
tion. Finally, we explore how searching for gravitational 
waves using GR templates can lead to biases in the recov- 
ered parameters if Nature is described by an alternative 
theory of gravity. We find that these biases can become 
significant before the evidence disfavors GR. 



A. Cheap Bounds and Comparison with Pulsar 
Bounds 

The first question wc seek to address in this paper is 
how well the four ppE parameters (a, a, /3, b) can be de- 
termined. One approach to answering this question is to 
examine how a search using ppE templates would look 
when used to characterize a signal that is consistent with 
GR. That is, if the signal observed is described by GR to 
the given level of accuracy of our detectors, what values 
for the ppE parameters will be recovered from a search 
with ppE templates? Because we know that in GR the 
values of a and (3 should be for all values of a and b, 
we wish to determine the typical spread in the recovered 
value of (a, /3), centered at zero. The standard deviation 
in this spread then gives us a constraint on the magni- 
tude of the deviation that is still consistent with obser- 
vations, ie. deviations that are 'inside our observational 
error bars.' 

Cheap constraints will be defined as the (3<r)-bound 
on the posterior distribution of ppE parameters a or /?, 
while keeping a or b fixed and marginalizing over all other 
system parameters. These bounds are 'cheap' because 
we do not have to re-run a search with pure GR tem- 
plates and then compute the evidence, via integration 
of the posterior, to compute the Bayes factor (the latter 
is particularly computationally expensive). These cheap 
bounds are similar to constraints studied by looking at 
the (a, a) or (/3, 0) elements of the variance-covariance 
matrix. Our cheap constraints, however, are 3cr ones, in 
contrast to the more standard lcr bounds quoted from 
variance-covariance matrix studies. 
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FIG. 1: UPPER PANEL:Bounds on a for different values of 
a, found using two different aLIGO sources. The two sources 
had different mass ratios, total masses, and sky locations, 
but were scaled to have a network SNR of 20. The rough 
estimate for the a bound from equation (|2UI) is shown for 
comparison. Also included is the bound on a derived from 
the golden pulsar (PSR J0737-3039) data. 
LOWER PANEL: Bounds on a for different values of a, found 
using two LISA sources at redshift z = I and z — 3. The 
pulsar bound is shown for comparison. The sources injected 
had the same parameters as those from the lower panel in 
Figure [2] . 

Rough analytic estimates for the bounds on (a, (3) can 
be derived by considering how the the ppE terms affect 
the overall amplitude A and phase "J of the signal: 

A In -4 £Si 0(l4jn-»dx) 

A* ~ fluU-l&ax)- (19) 

Here u m i n and it max are the minimum and maximum 
values of the u parameter. For the aLIGO sources 
Mmin ~ 3 x 10~ 3 , while for the LISA sources u min ~ 
10~ 3 . The ISCO cut-off in the frequency evolution sets 
Wmax ~ 3 x 10~ 2 for moderate mass ratios. Combin- 
ing these estimates with a crude Fisher matrix estimate 
for how well the amplitude and phase are constrained: 
A In A ~ A* ~ 1/SNR yields the 3cr bounds 



" SNR|< lin -< lax | 

^ ~ SNR|< in -<J- 

These estimates reproduce the overall shape of the exclu- 
sion plots in the (a, a) and (b, j3) planes, but they tend 
to over estimate the strength of the bounds as they do 
not take into account covariances with other parameters. 
The a bounds turn out to be a factor of ~ 10 weaker due 
to covariances between a and the distance and inclina- 
tion, while the bounds on f3 come out a factor of ~ 100 
weaker due to covariances between (3 and the chirp mass 
and mass ratio. 



Figures Q] and [2] show these cheap constraints on the 
ppE amplitude parameters as a function of the exponents 
a and b for a variety of aLIGO / aVirgo and LISA detec- 
tions. To generate these plots, we injected GR signals 
and then searched on them with ppE templates. For each 
search, either a or & was held fixed at a specific value, 
while the other three ppE parameters (and all other sys- 
tem parameters) were allowed to vary. We then calcu- 
lated the standard deviation of the posterior distribution 
of the relevant amplitude parameter a or ft, and used 
three times this value as the cheap bound shown on the 
plots. 

A natural course of action might seem to be the follow- 
ing: marginalize over a and b as well, instead of keeping 
them fixed, and calculate constraints on a and j3 this way. 
Looking at Figures [2] and [TJ however, show why this anal- 
ysis would not be particularly helpful. The uncertainty 
in a and j3 is so much higher at the positive ends of the 
prior ranges on a and b than at the negative ends that the 
Markov chains would spend almost all of their iterations 
exploring this area of parameter space if a and b were al- 
lowed to change. Thus, to get any knowledge about the 
uncertainties in a and ft for negative values of a and b, 
we need to fix a and b. 

The aLIGO systems were chosen to have network 
SNR = 20, but different masses and sky locations. One 
system had masses mi = 6M@, mi = 18M© (77 = 
0.1875), D L = 258 Mpc, while the other had m x = 6M , 
m 2 = 12M Q (77 = 0.2222), D L = 462 Mpc. The 
LISA sources were at different redshifts and had differ- 
ent masses and SNRs. The system at redshift z = 1 
had mi = 1 x 10 6 M Q , m 2 = 3x 10 6 M Q (77 = 0.1875) 
and SNR = 879, while the system at redshift z = 3 had 
mi = 2 x 10 6 A/ Q , m 2 = 3 x 10 6 M© (77 = 0.24) and 
SNR = 280. 

Figures [T][2] are 'exclusion' plots, showing the region 
(above the curves) which could be excluded with a 
99.73% confidence. These figures also plot the bound 
on the ppE parameters that have already been achieved 
through analysis of the 'golden pulsar' system, PSR 
J0737-3039 [41(. Observe that for the amplitude pa- 
rameter a, the pulsar bounds beat the aLIGO bounds 
through almost the entire range of a; LISA can improve 
upon the pulsar bounds for a > 0. For the phase parame- 
ter f3, however, both aLIGO and LISA do better than the 
pulsar analysis through a significant portion of the range. 
As expected, gravitational wave observations tend to do 
better in the strong field regime, corresponding to high 
post-Newtonian terms (6 > —5/3 and a > 0), while the 
reverse is true for binary pulsar observations. 

Vertical lines in Figs. [T] and [2] can be mapped to bounds 
on specific alternative theories, which we can then com- 
pare to current Solar System constraints. For example, 
consider the following cases: 

• Brans-Dicke [(a,6,/3 BD ) = (0, -7/3 J^ D )]: The 
tracking of the Cassini spacecraft [64[ has con- 
strained uj bd > w B d = 4 x 10 3 , which then forces 
/3 BD < (5/3584)4" 2/5 (si -s 2 ) 2 /o)bd, where s il2 are 
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FIG. 2: UPPER PANEL: Bounds on /3 for different values of 
b for a single SNR = 20 aLIGO/aVirgo detection. Plotted 
here is a (3<r) constraint, where a is the standard deviation 
of the /3 parameter derived from the Markov chains. The 
sources injected had the same parameters as those from the 
upper panel in Figure \T\ Also included is the bound on 
/3 derived from the golden pulsar (PSR J0737-3039) data, as 
well as bounds found from solar system experiments and other 
aLIGO analyses for massive graviton theory. 
LOWER PANEL: Bounds on /3 for different values of b found 
using two LISA sources at redshift z = 1 and z — 3. The 
pulsar bound is shown for comparison, as well as bounds found 
from solar system experiments and other LISA analyses for 
massive graviton theory. These other bounds are scaled to a 
system with z = 1. 



the sensitivities of the binary components (for BHs 
,s BH = 1/2, and for NSs s NS « 0.2 - 0.3). 

Massive Graviton [(a, b,/3 MO ) = (0, —1, B Ma )]: Ob- 
servations of Solar system dynamics [65( have con- 
strained A M g > A MG = 2.8 x 10 12 km, which then 
forces /3 M g < tr 2 (D/\ MG )M(l + km -2 , where 
D is a distance measure to the source M. 



with a black circle 2 . Observe that the constraints we 
could place with aLIGO and particularly LISA can be 
orders of magnitude stronger than Solar System con- 
straints (below the black circle). This is more easily 
seen by mapping our projected constraints on (3 MG to 
constraints on A MG ; with the aLIGO source, we find 
Amg < 8.8 x 10 12 km, while for the LISA source, we find 
Amg is 3.763 x 10 16 km. This is consistent with results 
from previous Fisher pj-fpol] and Bayesian studies (ioj . 
Plotted for comparison are the bounds from Pozzo et al. 
40j on the upper panel of [5] and from Stavridis and Will 
ll| on the lower panel of [5] both labeled as "massive 
graviton." We find that our bound on /3 for b = —1 is 
quite comparable to those found in these previous stud- 
ies. Finally, shown on the lower panel of Fig. [5] are the 
bounds found in the study by Arun et al. [12j . which al- 
lowed the PN coefficients themselves to vary as parame- 
ters. Their bounds on /? are somewhat weaker than those 
we found in our analysis, but this is an expected effect of 
the covariance between the PN coefficients. 

For all comparisons with previous studies, we took into 
account differences in SNR between the systems we an- 
alyzed and those we were comparing to. We also chose 
systems with the same or very similar total masses and 
mass ratios as those explored in previous papers. For 
the LISA systems, we compare the results from previous 
papers to our results for redshift z = 1. 

These plots show several other features that deserve 
further discussion. First, observe that all results show 
very little dependence on the choice of system parame- 
ters. This is quantitatively true for the aLIGO sources, 
shown in the upper panels of Figs. [2] and [TJ as these sig- 
nals have the same SNR. The LISA sources, shown in 
the lower panels of Figures [2] and [TJ show a factor of ~ 9 
offset, since these curves correspond to signals with dif- 
ferent SNRs. The SNR difference is a factor of ~ 3, which 
is a bit surprising as one would expect the spread on a 
parameter to scale with the SNR, and not the square 
of the SNR. However, we are working here in a region 
where the quadratic approximation to the Fisher matrix 
is singular, so the usual scaling does not hold. The more 
rigorous bounds derived in the next section do follow a 
linear scaling with SNR, which is reasonable since they 
use ppE injections and have non-singular Fisher matrix 
elements for the ppE parameters. 

Another interesting feature in these plots are the spikes 
at certain values of a and b. These spikes say that for 
those values of a and b, gravitational wave observations 
can say little about the magnitude of GR deviations. The 
reason for such spikes is that for those values of a and b, 
a and j3 become completely or partially degenerate with 
other parameters. For instance, when a = 0, a is fully 



2 We don't show similar constraints for Brans-Dicke theory, as 
here we consider binary BH inspirals, for which the Brans-Dicke 



The Solar System constraint On /3mG is shown in Fig. [2] correction would vanish due to the no-hair theorem. 
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degenerate with the luminosity distance, and when 6 = 0, 
P is fully degenerate with the initial orbital phase <f> c . 

One can also develop 'cheap' bounds that use ppE in- 
stead of GR signal injections. For instance, one could 
start with injections with a range of values for a and /3, 
and then look to see when the posterior distributions for 
these parameters no longer show significant support at 
the GR values of a = /3 = 0. These two types of cheap 
bounds are illustrated in Fig. [3l Given an observation of 
a non-zero a, a cheap bound calculation as described in 
this section (solid curve) would indicate a value \a\ < 1.5 
is still consistent with GR. A similar study with ppE in- 
jections, however, which produced the dashed-curve pos- 
terior distribution for a would indicate a preference for 
the ppE model over the GR model with a detection of 
a > 0.75. Thus the technique used in this section, which 
is a variance-covariance study, answers an inherently dif- 
ferent question from a model selection study. In the next 
section, we explore model selection in detail. 

1.8 i 1 1 1 1 1 1 1 1 

1.6 - 

1.4 - / \ 

1-2 - ; \ 

1 - ■ \ 




FIG. 3: An illustration of the two approaches for calculating 
cheap bounds on the ppE amplitude parameters. The solid 
curve illustrates the bound that can be derived by looking at 
the spread in the amplitude a when applying the ppE search 
to GR signals. In this example, values of \a\ > 1.5 would be 
taken as indicating a departure from GR. The dashed curve 
shows the bound that can be derived by starting with ppE 
signals and determining how large the ppE amplitude needs 
to be for the posterior distribution to have little weight at the 
GR value of a = 0. In this example, theories with a > 0.75 
would be considered distinguishable from GR. 



B. Rigorous Bounds and Model Selection 

In order to see how accurate the cheap bounds found 
in the previous section are, we next performed a full 
Bayesian model selection analysis on several different sig- 
nals. We injected a signal with a given set of ppE param- 
eters and ran a search using both GR and ppE templates. 
We then calculated the Bayesian evidence for each model 
and from this the Bayes factor. To compare these results 



to the cheap bounds, we ran the analysis on several dif- 
ferent ppE signals, each with the same injected value of a 
or b, but with progressively larger values of a or /3. This 
then allows us to determine the values of ppE amplitudes 
a or /3 where the evidence for the ppE hypothesis exceeds 
that of the GR hypothesis by some large factor, which we 
took to be Bayes factors in excess of 100 (in the Jeffery's 
classification [66|, Bayes factors in excess of 100 represent 
decisive evidence in favor of that model). 

We do not expect the cheap bounds to agree precisely 
with the more rigorous model selection bounds as they 
are based on quite different reasoning. The cheap bounds 
simulate what we would find if GR was consistent with 
observations, and establishes the spread in the ppE am- 
plitude parameters that would remain consistent. If we 
were to analyze some data and find ppE amplitude pa- 
rameters outside of this range, it would give us motiva- 
tion to search more rigorously for departures from GR. 
With the more expensive model selection bounds, we 
start with non-GR signals and seek to determine how 
large the departures from GR have to be for the ppE hy- 
pothesis to be preferred. In the first case the distribution 
of a and (3 is known to be centered around zero, but in 
the second case they are not, so the two analyses should 
not be expected to agree precisely. 

One can derive a more detailed connection between 
the alternative form of the cheap bounds derived us- 
ing ppE injections (discussed at the end of the previ- 
ous section) and the more rigorous Bayesian evidence 
calculations using the Savage-Dickey density ratio [o7| . 
The latter states that for nested hypotheses with sepa- 
rable priors, the Bayes factor is equal to the ratio of the 
posterior and prior densities evaluated at the parameter 
values that correspond to the lower dimensional model. 
If the posterior distribution was a Gaussian with width 
a centered at a = na, and we were using a uniform 
prior with width Ncr, then the Bayes factor would equal 
BF = Ne~ n / 2 /\/27r, where this Bayes factor shows the 
odds of the lower dimensional model being correct. For 
example, with N = 100 and n = 4 we get a Bayes factor 
of BF = 0.013, showing strong support for the higher di- 
mensional model. While the cheap bounds that can be 
derived using ppE signal injections will be stronger than 
the cheap bounds that can be derived from GR signal 
injections, the computational cost is higher as multiple 
simulations have to be run to find the transition point, 
and this approach is only moderately cheaper than per- 
forming the full Bayesian model selection. 

Examples of the full model selection procedure are 
shown in Fig. @] for aLIGO/aVirgo detections with 
SNR = 20. Each panel shows Bayes factors for two types 
of ppE search, one with a or & held fixed at the injected 
value, and one in which all four ppE values were allowed 
to vary. The Bayes factor, defined in Eq. ©, is here the 
odds ratio between the ppE model and the GR model. A 
larger Bayes factor indicates a stronger preference for the 
ppE model. The search in which a or 6 was fixed pro- 
vides the closest comparison with the cheap bounds of 
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FIG. 4: UPPER PANEL: Bayes factors for a SNR = 
20 aLIGO ppE injection with parameters (a, a, b, j3) = 
(0, 0, —1.25, 0). The Bayes factors are the 'betting odds' that 
ppE (and not GR) is the model that accurately describes the 
data. As the deviation from GR gets larger, ppE becomes the 
preferred model. 

LOWER PANEL: Bayes factors for a SNR = 20 aLIGO ppE 
injection with parameters (a,a,b,/3) = (—0.5, a, 0, 0). 



the previous section. The bound on (3 derived by setting 
a Bayes factor threshold of 100 arc roughly 3 times larger 
than the cheap bounds when b is held fixed and roughly 
2 times larger when b is free to vary. The bounds on 
a match the cheap bounds when a is held fixed, and is 
slightly smaller when a is allowed to vary. 

We were surprised to find that the bounds are tighter 
for the higher dimensional models, with (a, b) free, than 
for the lower dimensional models, with (a, b) fixed. To ex- 
plore this more thoroughly, we performed a study where 
the prior on b was increased from a very small range to 
the full prior range. Since holding a parameter fixed is 
equivalent to using a delta-function prior, we expect the 
evidence to interpolate between the values found when b 
was fixed and when b was free to explore the full prior. 
Figure [5] confirms this expectation, and also provides an 
explanation for the growth in the evidence. 



To understand this plot, it is helpful to look at the 
Laplace approximation to the evidence [68^ . which as- 
sumes that the region surrounding the maximum of the 
posterior distribution is well approximated by a multi- 
variate Gaussian. With this assumption, the evidence is 
given by 



p(d\H)^ P (d\0,H)\ 



MAP 



H 



V H 



(21) 



The first term is the likelihood evaluated at the maxi- 
mum of the posterior, and the second term is the ratio 
of the posterior volume AV to the prior volume V. The 
posterior volume can be estimated from the volume of 
the error ellipsoid containing 95% of the posterior prob- 
ability. The ratio O = AV jV is termed the "Occam 
factor", and the quantity I = \og 2 {V/AV) provides a 
measure of how much information has been gained about 
the parameters from the data. 

Now consider a situation where we have nested hy- 
potheses "Ho and Hi, with the second hypothesis involv- 
ing an additional parameter y. If the likelihood is insen- 
sitive to y then the first factor in the evidence stays the 
same, and since y is unconstrained, AV y — V y and the 
Occam factor is also unchanged. Thus, both models have 
the same evidence, even though one has more parameters 
than the other. Conversely, if the additional parameter 
is tightly constrained by the data, can be a very 
small number. In this case, the evidence for Hi is much 
reduced by the Occam factor, and the factor is referred 
to as an "Occam penalty." 

The growth in evidence for the ppE model as the prior 
range for b gets larger is an effect of this Occam fac- 
tor, which is a ratio of the uncertainty in the recovered 
value of an extra parameter to the prior volume for that 
parameter. As the prior range on b expands, this leads 
to a greater variance in the recovered values for j3'. Be- 
cause the prior volume of j3' remains unchanged, the large 
growth in its variance as the prior range of b is expanded 
leads to a large growth in the Occam factor - and thus 
a shrinking of the Occam penalty. As the Occam factor 
gets larger, so does the evidence for the ppE model. The 
evidence for the GR model, of course, does not depend 
on the priors we use for the ppE parameters, and so as 
the evidence for ppE grows, the Bayes factor indicates a 
stronger preference for ppE. 

Figure [6] shows Bayes factors between the GR and ppE 
hypotheses for a z — 1 LISA source. In the upper panel, 
the injections where chosen with a = 0, b = — 1 and vari- 
able /3, while in the lower panel the injections were cho- 
sen with a = 0.5,6 = and variable a. Because LISA 
sources have much higher SNR, the ppE parameters are 
more tightly constrained, and the difference between the 
Bayes factors when a or b are fixed versus freely varying 
is less pronounced. The more rigorous bounds on a and 
j3 are both a factor of ~ 2 times weaker than those pre- 
dicted by the cheap bounds, which is in line with what we 
found for the phase correction j3 in the aLIGO example. 
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FIG. 5: Here we plot the log of the evidence (E) for the ppE 
model characterizing a ppE injection as the prior volume on b 
is increased. The evidence for the ppE model increases with 
the prior volume on b. The growth in the evidence can be 
attributed to the growth in the variance of /3, which lessens the 
severity of the 'Occam penalty' for more model parameters. 



In summary, the cheap bounds provide a fair approxi- 
mation to the bounds that can be derived from Bayesian 
model selection, and can generally be trusted to within 
an order of magnitude. 



C. Fitting Factor 

Another quantity of interest is the fitting factor, which 
measures how well one template family can recover an 
alternative template family. To define the fitting factor, 
we must first define the match between two templates h 
and h! as 



M 



(h\h>) 



(22) 



The match is related to the metric distance between tem- 
plates (r39| by M = 1 — ^ 9% j Ax 1 Ax^ , where the metric is 
evaluated with the higher-dimensional model (appropri- 
ate when dealing with nested models) . The fitting factor 
FF is then defined as the best match that can be achieved 
by varying the parameters of the h' template family to 
match the template belonging to the the other family, h. 

Another interpretation for the fitting factor is as 
the fraction of the true signal-to-noise ratio SNR = 
\J (h\h) that is recovered by the frcqucntist statistic 
p = max{(h\h')/ y/ {h'\h')\. The imperfect fit leaves be- 
hind a residual [h — h') with SNR 2 CS = x 2 , which can be 
minimized by adjusting the amplitude of h! to yield 



SNR 2 CS = (1 - FF 2 )SNR 2 



(23) 



ID 
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FIG. 6: UPPER PANEL: Bayes factors for a z = 1 LISA ppE 
injection with parameters (a, a, b, j3) = (0, 0, —1.0, 0). 
LOWER PANEL: Bayes factors for a z = 1 LISA ppE injec- 
tion with parameters (a, a, b, j3) — (0.5, a, 0, 0). 



working in the limit where FF ~ 1, we have 



1 - FF 



SNR, 
2 SNR 2 



(24) 



We see then that the ability to detect departures from 
GR scales inversely with the square of the SNR, as given 
by Eq. (|24|) . On the other hand, the detectable difference 
between the parameters in the two theories will scale in- 
versely with a single power of the SNR. This is because 
this detectable difference is proportional to the square- 
root of the minimized match function and 



min(gij Ax 1 Ax J ') 



SNR, 
SNR 



(25) 



Assuming that a residual with SNR, is detectable, and 



and the metric is independent of SNR. This reasoning 
applies to both the additional model parameters of the 
alternative theory, e.g. Ax 1 = (a,/?), and the physical 
source parameters such as the masses and distance. We 
then expect both the bounds on the ppE model param- 
eters and the biases caused by using the wrong template 
family to scale inversely with SNR. This scaling is in 
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keeping with the usual scaling of parameter estimation 
errors that follows from a Fisher matrix analysis where 
(A^AaJ) ~ (MM -1 ~ SNR" 2 . Figure shows that 
the errors in the recovery of the ppE parameters follows 
the expected scaling with SNR. 



actual values 
1/SNR 



SNR 



FIG. 7: The scaling of the parameter estimation error in the 
ppE parameter j3 for an aLIGO simulation with ppE param- 
eters (a,a,b, /?) = (0,0,-1.25,0.1). The parameter errors 
follow the usual 1/SNR scaling. 



that the log Bayes factor is equal to 

e - x 2 m)/2 Qi 



logBF = log 



Y 2 • 
Amm 



= (1 - FF 



e -X 2 {U )/2 0q 

A log O 
SNR 2 



AlogO, (26) 



where O is the Occam factor, defined in the discussion 
following [Eq. ([2~T]) ]. Thus, up to the difference in the log 
Occam factors, the log Bayes factor should scale as 2(1 — 
FF) when FF ~ 1. This link is confirmed in Figure [5] 



D. Parameter Biases 

If we assume that Nature is described by GR, but in 
truth another theory is correct, this will result in the 
recovery of the wrong parameters for the systems we are 
studying. For instance, when looking at a signal that 
has non-zero ppE phase parameters, a search using GR 
templates will return the incorrect mass parameters, as 
illustrated in Fig. |H1 Observe that as the magnitude of (3 
is increased (thus increasing the Bayes factor), the error 
in the chirp mass parameter extraction grows well beyond 
statistical errors. 



Bayes Factor • 
Fitting Factor ■ 



CO 
CD 



P 



FIG. 8: The log Bayes factors and (1 — FF) plotted as a 
function of /3 for a ppE injection with parameters (a, a, b, fi) = 
(0,0, — 1.25, P). The predicted link between the fitting factor 
and Bayes factor is clearly apparent. 



Alternative models that are not well-fitted by GR will 
be more easily distinguished than models that can be 
well-fitted. This suggests that there should be a correla- 
tion between the fitting factor and the Bayes factor. The 
relationship can be established using the Laplace approx- 
imation to the evidence [Eq. (|21[) ]. from which it follows 




BF = 5.6 
P = 5 
injected vail 




2.85 2.9 
ln(M) 



2.8 2.85 
ln(M) 




BF = 3300 




P = 20 




~ 1 


1 



2.65 2.7 2.75 2.8 2.85 2.9 2.95 

ln(M) 



2.4 2.5 2.6 



2.7 2.8 2.9 

ln(M) 



FIG. 9: Histograms showing the recovered log total mass for 
GR (dashed) and ppE (solid) searches on ppE signals. As 
the source gets further from GR, the value for total mass 
recovered by the GR search moves away from the actual value. 
All signals had injected b = —0.25. 

Perhaps the most interesting point to be made with 
this study is that the GR templates return values of the 
total mass that are completely outside the error range 
of the (correct) parameters returned by the ppE search, 
even for ppE signals that are not clearly discernible from 
GR. We refer to this parameter biasing as 'stealth bias', 
as it is not an effect that would be easy to detect, even 
if one were looking for it. 
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As an example, consider stealth bias for non-zero ppE 
a parameters, as illustrated in Fig. [TDJ As one would 
expect, when a GR template is used to search on a ppE 
signal that has non-zero ppE amplitude corrections, the 
parameter that is most affected is the luminosity dis- 
tance. We again see the bias of the recovered parameter 
becoming more apparent as the signal differs more from 
GR 3 . For example, the recovered posterior distribution 
from the search using GR templates has zero weight at 
the correct value of luminosity distance when the Bayes 
factor is ~ 50. Even when the Bayes factor is of order 
unity, the peaks of the posterior distributions of the lu- 
minosity distance differ by approximately 10 Gpc. 
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FIG. 10: Histograms showing the recovered values for lumi- 
nosity distance from GR and ppE searches on a LISA binary 
at redshift z = 7. Both signals have a = 0.5, and were in- 
jected with a luminosity distance of 70.5 Gpc. The top plot 
has a = 3.0 and the bottom has a = 2.5. As the Bayes factor 
favors the ppE model more strongly, the bias in the recov- 
ered luminosity distance from the GR search becomes more 
pronounced. 



V. CONCLUSION 

The two main results of this study are that GW ob- 
servations of binary compact object inspirals using ppE 
waveforms can constrain higher PN order (i.e. b > —5/3 
and a > 0) deviations from GR much more tightly than 



3 Here, the uncertainty in the recovered luminosity distance 
changes considerably between the different systems, because we 
held the injected luminosity distance constant instead of the in- 
jected SNR. 



binary pulsar observations, and that parameter estimates 
can be significantly biased if GR templates are used to 
recover signals when an alternative theory of gravity bet- 
ter describes the event. This latter bias can be signifi- 
cant even in cases where it is not obvious that GR is 
not quite the correct theory of gravity. We also see that 
the detection efficiency of GR templates can be seriously 
compromised if they are used to characterize data that 
is not described by GR. 

The current study makes several simplifying assump- 
tions about the waveforms: we consider only the inspi- 
ral stage for non-spinning black holes on circular orbits, 
and include just the leading order ppE corrections to the 
waveforms. In future work we plan to include a marginal- 
ization over these higher order corrections. Including this 
marginalization will be more realistic, as the ppE for- 
malism allows for many higher order corrections to the 
waveform. Marginalizing over the higher order terms will 
weaken the bounds on the leading order ppE parame- 
ters, though probably not by that much since they are 
sub-dominant terms. 

Another subject that we will examine in the future is 
the effect on our analysis of multiple detections. Simul- 
taneously characterizing several systems with different 
mass ratios should allow us to examine the dependence of 
the a/ /3 parameters on spin, mass difference, mass ratio, 
etc.. Furthermore, looking at several systems simultane- 
ously will break the degeneracies between the ppE pa- 
rameters and the individual system parameters (masses, 
distances etc), and will allow us to detect significantly 
smaller deviations from GR. 

We also plan to perform a study similar to that done 
by Arun et al. [3ll433l |. in which the exponents a^, bi are 
fixed at the values found in the PN expansion of GR, and 
compare their Fisher matrix based bounds to those from 
Bayesian inference. We expect a full Bayesian inference 
study to lead to significantly different conclusions, due to 
the singularities in the Fisher matrix already observed in 
the present study. 

Finally, we will look at LISA observations of galactic 
white-dwarf binaries to see if the brighter systems, which 
may have SNRs in the hundreds, may allow us to beat 
the pulsar bounds across the entire ppE parameter space. 
The brightest white-dwarf systems will have u ~ 10 -8 — ¥ 
10 -7 (for comparison the 'golden' double pulsar system, 
PSR J0737-3039A has u = 3.94 x 10~ 9 ), and these small 
values for u make the ppE effects, which scale as u a and 
u b , much larger than for black hole inspirals when a,b < 
0. 

The chance to test the validity of Einstein's theory 
of gravity is one of the most exciting opportunities that 
gravitational wave astronomy will afford to the scientific 
community. Without the appropriate tools, however, our 
ability to perform these tests is sharply curtailed. This 
analysis has shown that the ppE template family could 
be an effective means of detecting and characterizing de- 
viations from GR, and also that assuming that our GR 
waveforms arc correct could lead to lessened detection cf- 
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ficiency and biased parameter estimates if gravity is de- 
scribed by an alternative theory (even when choosing pa- 
rameters at the threshold of what has already been ruled 
out by Solar System and binary pulsar observations) . We 
have identified several of future investigation, and 

will continue to study this area in depth. 
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VI. APPENDIX A 

As described in Section III, the VTA method for cal- 
culating evidences involves two possible sources of error. 
One is introduced by the fact that our Markov chains are 
of finite length. To get an idea of the magnitude of this 
statistical uncertainty, the implementation of the VTA 



that we used calculates the evidence many times using 
different sub-samples of the Markov chain. This process 
is called bootstrapping, and we find that in general it 
results in an uncertainty in the log Bayes factor of the 
order ±0.5. 

The second source of possible error in the VTA tech- 
niques comes from the choice of boxing number. The box- 
ing number is the number of points from the chain that 
are sorted into each volume element. A higher boxing 
number will return a more accurate number for the mean 
or median of the posterior in a given volume element, but 
at the cost of having large volume elements that may not 
resolve fine features in the posterior distribution. Lower 
boxing numbers lead to greater variance in the estimate 
of the posterior density in each cell, but allows for better 
resolution of sharp features in the posterior landscape. 
Jb examine the systematic error in Bayes factors associ- 
ted with using different boxing numbers, we calculated 
the Bayes factor between ppE and GR models for a source 
with injected ppE parameters (a, a, b, (i) = (0.5, 75, 0, 0). 
We first used thermodynamic integration with a run us- 
ing 50 chains, and found the log Bayes factor to equal 
log(B) = 12.0±1.0. Because thermodynamic integration 
performs more accurately than the VTA when integrat- 
ing posterior distributions for which analytic answers are 
available, such as a multi-variate Gaussian, we take this 
value as our reference. We then calculated the log Bayes 
factor using the VTA with boxing numbers of 16, 32, and 
64. The results, including the statistical uncertainty, are 
shown in Table II. 



TABLE II: Bayes factors calculated using the VTA with dif- 
ferent boxing numbers. 



Boxing Number 


GR 


ppE 


log(BF) 


16 

32 
64 


-41.50:^ 2 
-40.43±g:i* 
-39.51±8;if 


-31.04+^ 
-28.02±°-£ 

9 r 7 o+0.43 
zd - '"-0.30 


io.5±i;i! 

12.4t°;g 
13.718;? 



The results show that the variation in log(-B) between 
different boxing sizes is similar to, but slightly larger than 
the statistical variation introduced by the VTA within 
one boxing size. The variation due to choice of boxing 
size is roughly ±1.5. We therefore use error bars indi- 
cating log(-B) ± 1 on our Bayes factor plots. Further, we 
found that a boxing size of 32 returned the most accurate 
value for the Bayes factor, and so we used this size for 
the rest of our analysis. 



